
    Individual identity in songbirds: signal representations and metric learning for locating the information in complex corvid calls

    Bird calls range from simple tones to rich, dynamic, multi-harmonic structures. The more complex calls, such as those of the scientifically important corvid family (jackdaws, crows, ravens, etc.), are at present poorly understood. Individual birds can recognise familiar individuals from their calls, but where in the signal is this identity encoded? We study the question by applying a combination of feature representations to a dataset of jackdaw calls, including linear predictive coding (LPC) and adaptive discrete Fourier transform (aDFT). We demonstrate through a classification paradigm that we can strongly outperform a standard spectrogram representation for identifying individuals, and we apply metric learning to determine which time-frequency regions contribute most strongly to robust individual identification. Computational methods can help to direct our search for an understanding of these complex biological signals.
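
    As an illustration of the feature-extraction side of this work, here is a minimal sketch of computing frame-wise LPC coefficients from a recording with librosa. The model order and framing parameters are illustrative assumptions, not the paper's settings.

```python
# Sketch: frame-wise LPC features for individual identification
# (illustrative parameters; not the paper's exact configuration).
import numpy as np
import librosa

def lpc_features(path, order=12, frame_length=1024, hop_length=256):
    y, sr = librosa.load(path, sr=None, mono=True)
    frames = librosa.util.frame(y, frame_length=frame_length,
                                hop_length=hop_length)
    feats = []
    for frame in frames.T:
        frame = frame * np.hanning(len(frame))  # taper to reduce edge effects
        a = librosa.lpc(frame, order=order)     # all-pole coefficients, a[0] == 1
        feats.append(a[1:])                     # drop the fixed leading 1
    return np.array(feats)                      # shape: (n_frames, order)
```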

    Data-Efficient Weakly Supervised Learning for Low-Resource Audio Event Detection Using Deep Learning

    We propose a method to perform audio event detection under the common constraint that only limited training data are available. In training a deep learning system for audio event detection, two practical problems arise. Firstly, most datasets are "weakly labelled", providing only a list of the events present in each recording, without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to perform well, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose data-efficient training of a stacked convolutional and recurrent neural network. The network is trained in a multiple instance learning setting, for which we introduce a new loss function that improves training compared to the usual approaches to weakly supervised learning. We successfully test our approach on two low-resource datasets that lack temporal labels.
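
    The abstract does not specify the proposed loss, so the sketch below shows the standard multiple instance learning setup it builds on: frame-level event probabilities pooled into one clip-level prediction per class and scored against the weak clip label. Linear softmax pooling is an assumption here, not the paper's new loss.

```python
# Sketch: weakly supervised (multiple instance) training objective.
import torch

def clip_loss(frame_probs, clip_labels, eps=1e-7):
    """frame_probs: (batch, time, classes) in [0, 1]; clip_labels: (batch, classes)."""
    # Linear softmax pooling: frames with higher probability get higher weight,
    # so one confident frame can explain a positive weak label.
    weights = frame_probs / (frame_probs.sum(dim=1, keepdim=True) + eps)
    clip_probs = (frame_probs * weights).sum(dim=1).clamp(eps, 1 - eps)
    return torch.nn.functional.binary_cross_entropy(clip_probs, clip_labels)
```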

    THE EFFECT OF VARIATIONS IN BLADE DIMENSIONS AND DISCHARGE OUTLET AREA ON VORTEX TURBINE PERFORMANCE

    A vortex turbine is a turbine that uses a water vortex to drive the turbine blades; it operates at low head and can be used on river flows. This study used three different blade dimensions with a circular casing, together with three variations of discharge outlet and of shaft height above the casing base. The experiment used a pump to circulate the water and a flume as the inlet channel to the turbine housing. The steps carried out in this study comprised design, fabrication of the vortex turbine, and testing of the turbine torque. The results show that blade 3, with a height of 78.3 cm and a width of 13.5 cm, achieved higher efficiency than the other blades when using a discharge outlet diameter of 7 cm.
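
    Turbine efficiency in studies of this kind is the ratio of shaft power, computed from the measured torque and rotational speed, to the hydraulic power of the incoming flow. A small sketch with illustrative numbers (not measurements from this study):

```python
# Sketch: vortex turbine efficiency from measured torque and flow
# (illustrative values only, not data from the study).
import math

RHO, G = 1000.0, 9.81  # water density (kg/m^3), gravitational acceleration (m/s^2)

def efficiency(torque_nm, rpm, flow_m3s, head_m):
    omega = 2 * math.pi * rpm / 60             # shaft speed (rad/s)
    p_shaft = torque_nm * omega                # mechanical output power (W)
    p_hydraulic = RHO * G * flow_m3s * head_m  # available water power (W)
    return p_shaft / p_hydraulic

print(f"{efficiency(torque_nm=1.2, rpm=90, flow_m3s=0.006, head_m=0.4):.1%}")
```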

    Joint multi-pitch detection and score transcription for polyphonic piano music

    Research on automatic music transcription has largely focused on multi-pitch detection; there is limited discussion of how to obtain a machine- or human-readable score transcription. In this paper, we propose a method for joint multi-pitch detection and score transcription for polyphonic piano music. The outputs of our system include both a piano-roll representation (a descriptive transcription) and symbolic musical notation (a prescriptive transcription). Unlike traditional methods that first transcribe to MIDI and then convert the MIDI into a musical score, we use a multitask model that combines a convolutional recurrent neural network with sequence-to-sequence models using attention mechanisms. We propose a Reshaped score representation that outperforms a LilyPond representation in terms of both prediction accuracy and time/memory resources, and we compare different input audio spectrograms. We also create a new synthesized dataset for score transcription research. Experimental results show that the joint model outperforms a single-task model in score transcription.
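
    A minimal PyTorch sketch of the kind of joint architecture described: a shared CRNN encoder feeding both a frame-wise multi-pitch head (the descriptive, piano-roll output) and an attention-based sequence-to-sequence decoder over score tokens (the prescriptive output). All layer sizes and the token vocabulary are assumptions, not the paper's configuration.

```python
# Sketch: multitask multi-pitch detection + score transcription model.
import torch
import torch.nn as nn

class JointTranscriber(nn.Module):
    def __init__(self, n_bins=229, n_pitches=88, vocab=500, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((1, 2)),
        )
        self.gru = nn.GRU(32 * (n_bins // 4), hidden, batch_first=True,
                          bidirectional=True)
        self.pitch_head = nn.Linear(2 * hidden, n_pitches)  # descriptive output
        self.embed = nn.Embedding(vocab, hidden)
        self.attn = nn.MultiheadAttention(2 * hidden, 4, batch_first=True)
        self.dec = nn.GRU(hidden, 2 * hidden, batch_first=True)
        self.token_head = nn.Linear(2 * hidden, vocab)      # prescriptive output

    def forward(self, spec, score_tokens):
        # spec: (batch, time, freq) spectrogram; score_tokens: (batch, length)
        x = self.conv(spec.unsqueeze(1))                  # (B, C, T, F')
        x = x.permute(0, 2, 1, 3).flatten(2)              # (B, T, C*F')
        enc, _ = self.gru(x)                              # shared encoding
        piano_roll = torch.sigmoid(self.pitch_head(enc))  # multi-pitch detection
        dec, _ = self.dec(self.embed(score_tokens))       # teacher-forced decode
        ctx, _ = self.attn(dec, enc, enc)                 # attend over audio frames
        score_logits = self.token_head(dec + ctx)         # score-token prediction
        return piano_roll, score_logits
```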

    ACPAS: A Dataset of Aligned Classical Piano Audio and Scores for Audio-to-Score Transcription

    We create the ACPAS dataset, with aligned audio and scores for classical piano music, for automatic music audio-to-score transcription research. The dataset contains 497 distinct music scores aligned with 2,189 audio performances, 179.8 hours in total. To our knowledge, it is currently the largest dataset for audio-to-score transcription research. We provide aligned performance audio, performance MIDI, and MIDI scores, together with beat, key signature, and time signature annotations. The dataset is partly collected from existing automatic music transcription (AMT) datasets and partly synthesized; both real and synthetic recordings are included. We provide a train/validation/test split with no piece overlap, in line with the splits in other AMT datasets.
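
    A piece-disjoint split such as the one described can be produced by grouping performances by their source score before sampling. A minimal sketch, assuming each record carries a hypothetical piece_id field:

```python
# Sketch: train/validation/test split with no piece overlap
# (the 'piece_id' field and the split ratios are assumptions).
import random
from collections import defaultdict

def split_by_piece(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """records: iterable of dicts with a 'piece_id' key; returns three lists."""
    by_piece = defaultdict(list)
    for r in records:
        by_piece[r["piece_id"]].append(r)   # group performances by score
    pieces = sorted(by_piece)
    random.Random(seed).shuffle(pieces)     # split at the piece level
    n = len(pieces)
    cut1, cut2 = int(ratios[0] * n), int((ratios[0] + ratios[1]) * n)
    groups = (pieces[:cut1], pieces[cut1:cut2], pieces[cut2:])
    return tuple([r for p in g for r in by_piece[p]] for g in groups)
```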

    Ensemble Models for Spoofing Detection in Automatic Speaker Verification

    Detecting spoofing attempts against automatic speaker verification (ASV) systems is challenging, especially when using only one modelling approach. For robustness, we use both deep neural networks and traditional machine learning models, and combine them into ensemble models through logistic regression. They are trained to detect logical access (LA) and physical access (PA) attacks on the dataset released as part of the ASV Spoofing and Countermeasures Challenge 2019. We propose dataset partitions that ensure different attack types are present during training and validation, improving system robustness. Our ensemble model outperforms all of our single models and the challenge baselines for both attack types. We investigate why some models on the PA dataset strongly outperform others, and find that spoofed recordings in the dataset tend to have longer silences at the end than genuine ones. Removing these silences makes the PA task much more challenging: the tandem detection cost function (t-DCF) of our best single model rises from 0.1672 to 0.5018, and its equal error rate (EER) increases from 5.98% to 19.8% on the development set.
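
    Score-level fusion through logistic regression, as described, amounts to stacking: fit the regression on the single models' development-set scores, then apply it to fuse evaluation scores. A minimal sklearn sketch (the layout of the score matrix is an assumption):

```python
# Sketch: fusing single-model spoofing scores with logistic regression.
from sklearn.linear_model import LogisticRegression

def fit_ensemble(scores_dev, labels_dev):
    """scores_dev: (n_utterances, n_models) per-model scores;
    labels_dev: 1 = genuine, 0 = spoofed."""
    fuser = LogisticRegression(max_iter=1000)
    fuser.fit(scores_dev, labels_dev)
    return fuser

def ensemble_scores(fuser, scores_eval):
    return fuser.predict_proba(scores_eval)[:, 1]  # fused genuineness score
```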

    Deep perceptual embeddings for unlabelled animal sound events

    Evaluating sound similarity is a fundamental building block of acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often trained with triplet networks, have proved useful in many fields; however, such networks are usually trained on large class-labelled datasets, and such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies for perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. To cope with the difficulty of triplet-loss training in the absence of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning over learning a neural embedding from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method for developing perceptual models of a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
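
    A minimal sketch of the triplet objective underlying such models, assuming triplets are drawn from proxy similarity judgements ("the anchor sounds more like the positive than the negative"); the margin value is an assumption:

```python
# Sketch: triplet loss on embeddings derived from similarity judgements.
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Each input: (batch, dim) embeddings from the shared network."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    # Hinge on the distance gap: push the positive closer than the
    # negative by at least the margin.
    return F.relu(d_pos - d_neg + margin).mean()
```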

    Performance MIDI-to-score conversion by neural beat tracking

    Rhythm quantisation is an essential part of converting performance MIDI recordings into musical scores. Previous work on rhythm quantisation has been limited to probabilistic or statistical methods. In this paper, we propose a MIDI-to-score quantisation method using a convolutional recurrent neural network (CRNN), trained on MIDI note sequences, that predicts whether notes are on beats. We then expand the CRNN model to predict the quantised times of all beat and non-beat notes, and further enable it to predict the key signatures, time signatures, and hand parts of all notes. Our proposed performance MIDI-to-score system achieves significantly better results than commercial software when evaluated on the MV2H metric. We release a toolbox for converting performance MIDI into MIDI scores at: https://github.com/cheriell/PM2
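
    A minimal sketch of a CRNN beat classifier of the kind described, operating on per-note feature sequences; the input features and layer sizes are assumptions, not the released toolbox's configuration:

```python
# Sketch: CRNN over MIDI note sequences predicting per-note on-beat
# probability (feature set and sizes are illustrative assumptions).
import torch
import torch.nn as nn

class BeatCRNN(nn.Module):
    def __init__(self, n_feats=4, hidden=128):
        super().__init__()
        # Per-note features, e.g. (pitch, inter-onset interval, duration, velocity).
        self.conv = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=9, padding=4), nn.ReLU(),
        )
        self.gru = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, notes):                 # notes: (batch, seq, n_feats)
        x = self.conv(notes.transpose(1, 2))  # (batch, 64, seq)
        x, _ = self.gru(x.transpose(1, 2))    # (batch, seq, 2*hidden)
        return torch.sigmoid(self.head(x)).squeeze(-1)  # P(on beat) per note
```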